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ABSTRACT 

Delay-coordinate reconstruction is a proven modeling 
strategy for building effective forecasts of nonlinear time 
series. The first step in this process is the estimation of 
good values for two parameters, the time delay and the 
embedding dimension. Many heuristics and strategies 
have been proposed in the literature for estimating these 
values. Few, if any, of these methods were developed 
with forecasting in mind, however, and their results are 
not optimal for that purpose. Even so, these heuristics— 
intended for other applications—are routinely used when 
building delay coordinate reconstruction-based forecast 
models. In this paper, we propose a new strategy for 
choosing optimal parameter values for forecast methods 
that are based on delay-coordinate reconstructions. The 
basic calculation involves maximizing the shared infor¬ 
mation between each delay vector and the future state of 
the system. We illustrate the effectiveness of this method 
on several synthetic and experimental systems, showing 
that this metric can be calculated quickly and reliably 
from a relatively short time series, and that it provides a 
direct indication of how well a near-neighbor based fore¬ 
casting method will work on a given delay reconstruction 
of that time series. This allows a practitioner to choose 
reconstruction parameters that avoid any pathologies, re¬ 
gardless of the underlying mechanism, and maximize the 
predictive information contained in the reconstruction. 

I. INTRODUCTION 

The method of delays is a well-established technique 
for reconstructing the state-space dynamics of a system 
from scalar time-series data^“^. The task of choosing 
good values for the free parameters in this procedure has 
been the subject of a large and active body of literature 
over the past few decades, e.g.,"^^^®. The majority of these 
techniques focus on the geometry of the reconstruction. 
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A standard method for selecting the delay r, for instance, 
is to maximize independence between the coordinates of 
the delay vector while minimizing overfolding and reduc¬ 
tion in causality between coordinates'^; a common way 
to choose an embedding dimension is to track changes in 
near-neighbor relationships in reconstructions of different 
dimensions^ b 

This heavy focus on the geometry of the delay recon¬ 
struction is appropriate when one is interested in quan¬ 
tities like fractal dimension and Lyapunov exponents, 
but it is not necessarily the best approach when one is 
building a delay reconstruction for the purposes of pre¬ 
diction. That issue, which is the focus of this paper, has 
received comparatively little attention in the extensive 
literature on delay reconstruction-based prediction^*'"^^. 
In the following section, we propose a robust, compu¬ 
tationally efficient method called SPI that can be used 
to select parameter values that maximize the shared in¬ 
formation between the past and the future—or, equiva¬ 
lently, that maximize the reduction in uncertainty about 
the future given the current model of the past. The im¬ 
plementation details, and a complexity analysis of the 
algorithm, are covered in Section III. In Section IV, we 
show that simple prediction methods working with SPI- 
optimal reconstructions—constructions using parameter 
values that follow from the SPI calculations—perform 
better, on both real and synthetic examples, than those 
same forecast methods working with reconstructions that 
are built using the traditional methods mentioned above. 
Finally, in Section V we explore the utility of SPI in the 
face of different data lengths and prediction horizons. 


II. SHARED INFORMATION AND DELAY 
RECONSTRUCTIONS 

The information shared between the past and the fu¬ 
ture is known as the excess entropy^^. We will de¬ 
note it here hy E = 101 where I is the mutual 
information^^ and and ^ represent the infinite past 
and the infinite future, respectively. E is often difficult to 
estimate from data due to the need to calculate statistics 
over potentially infinite random variables^"*. While this 
is possible in principle, it is too difficult in practice for all 
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but the simplest of dynamics^^. In any case, the excess 
entropy is not exactly what one needs for the purposes of 
prediction, since it is not realistic to expect to have the 
infinite past or to predict infinitely far into the future. 
For our purposes, it is more productive to consider the 
information contained in the recent past and determine 
how much that explains about the not-too-distant future. 
To that end, we define 

SPI = I[SfX^+j,] , 

where Sj is an estimate of the state of the system at 
time j and Xj^p is the state of the system p steps in the 
future. 

This can be neatly visualized—and compared to tra¬ 
ditional methods like time-delayed mutual information, 
multi-information and the so-called co-information^®— 
using the I-diagrams of Yeung^ b Figure 1 shows an I- 
diagram of time-delayed mutual information for a specific 
r. In a diagram like this, each circle represents the un¬ 
certainty in a particular variable. The left circle in Fig¬ 
ure 1, for instance, represents the average uncertainty in 
observing Xj-r (i.e., H[Xj-r], where H is the Shannon 
entropy^'^); similarly, the top circle represents H[Xj^p] 
or the uncertainty in the future observation. Each 
of the overlapping regions represents shared uncertainty: 
e.g., in Figure 1, the shaded region represents the shared 
uncertainty between Xj and More precisely, the 

shaded region schematizes the quantity 

I[Xf,X,_,] = H[X,] + H[X,_,] - H[X„X,_,] 

= H[X,]-H[X,\X,.r] 

= H[X,_r] - H[X,_r\X,]. 

If the X are trajectories in reconstructed state space, 
then tuning the reconstruction parameters (e.g., t) 
changes the size of the overlap regions—i.e., the amount 
of information shared between the coordinates of the de¬ 
lay vector. This notion can be put into practice to select 
good values for those parameters. Notice, for instance, 
that minimizing the shaded region in Figure 1 —that is, 
rendering Xj and Xj_T- as independent as possible— 
maximizes the total uncertainty that is explained by the 
combined model [Xj^Xj-rY" (the sum of the area of the 
two circles). This is precisely the argument made by 
Fraser and Swinney in®. However, it is easy to see from 
the I-diagram that choosing r in this way does not ex¬ 
plicitly take into account explanations of the future —that 
is, it does not reduce the uncertainty about More¬ 

over, the calculation does not extend to three or more 
variables, where minimizing overlap is not a trivial ex¬ 
tension of the reasoning captured in the I-diagrams. 

The obvious next step would be to explicitly include 
the future in the estimation procedure. One approach to 
this would be to work with the so-called co-information^®, 

C = I[Xj]Xj-T',Xjj.p] , 

As depicted in Figure 2a, this is the intersection of H[Xj], 
H[Xj-T] and H[Xj+p]. It describes the reduction in un- 
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FIG. I: An I-diagram of the time-delayed mutual 
information. The circles represent uncertainties (H) in 
different variables; the shaded region represents 
I[Xj; Xj-r], the time-delayed mutual information 
between the current state Xj and the state r time units 
in the past, Xj-r- Notice that the shaded region is 
indifferent to H[Xj^p], the uncertainty about the 
future. 

certainty that the two past states, together, provide re¬ 
garding the future. While this is obviously an improve¬ 
ment over the time-delayed mutual information of Fig¬ 
ure I, it does not take into account the information that 
is shared between Xj and the future but not shared with 
the past (i.e., Xj-r)-, and vice versa. The so-called multi¬ 
information, 

M=J2 iH[X^]) - i?[Y„ Y,_., Y,+p] , 

depicted in Figure 2b addresses this shortcoming, but 
it also includes information that is shared between the 
past and the present, but not with the future. This is 
not terribly useful for the purposes of prediction. More¬ 
over, the multi-information overweights information that 
is shared between all three circles—past, present, and 
future—thereby artificially over-valuing information that 
is shared in all delay coordinates. In the context of pre¬ 
dicting Yj_|_p, the provenance of the information is irrele¬ 
vant and so the multi-information seems ill-suited to the 
task at hand as well. 

SPI addresses all of the issues raised in the previous 
paragraphs. By treating the generic delay vector as a 
joint variable, rather than a series of single variables, 
SPI captures the shared information between the past, 
present, and future independently (the left and right col¬ 
ored wedges in Figure 2), as well as the information that 
the past and present, together, share with the future (the 
center wedge). By choosing delay reconstruction param¬ 
eters that maximize SPI, then, one can explicitly max¬ 
imize the amount of information that each delay vector 
contains about the future. 
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(b) An I-diagram of the multi-information, 

M[Xj, Xj-T--, Xj+p], The centermost region is more darkly 
shaded here to reflect the extra weight that that region 
carries in the calculation. 


To make all of this more concrete and tie it back to 
state-space prediction of dynamical systems, consider the 
following example: let Sj be a two-dimensional delay re¬ 
construction of the time series, Sj = [xj,Xj-r]'^ . In this 
case, SPI becomes which describes 

the reduction in uncertainty about the system at time 
j +p, given the state estimate [Xj.Xj-rY'. One can es¬ 
timate a r value for the purposes of reconstructing the 
dynamics from a given time series, for instance, by cal¬ 
culating SPI for a range of r and choosing the first max¬ 
imum (i.e., minimizing the uncertainty about the fu¬ 
ture observation). One can then apply any state-space 
forecasting method to the resulting reconstruction in or¬ 
der to predict the future course of that time series. In 
Section IV, we explore that claim using Lorenz’s classic 
method of analogues^ but it should be just as appli¬ 
cable for other predictors that utilize state-space recon¬ 
structions, such as the methods used 

Notice that both the definition of SPI and its use in 
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FIG. 2: An I-diagram of SPI, the quantity proposed in 
this paper: I^Xj,Xj-p]\Xj+p\. This quantity captures 
the shared information between the past, present, and 
future independently, as well as the information that 
the past and present, together, share with the future. 


optimizing forecast algorithms are general ideas that are 
easily extensible to other state estimators. For example, 
in the case of traditional delay-coordinate embedding^ the 
state estimator is the m-dimensional delay vector, i.e., 

Sj = [Xj, Xj-p ,..., 

with m chosen to meet the appropriate theoretical 
requirements b'b We demonstrate this approach in 
Section IV. If the time series is pre-processed (e.g., 
via a Kalman filter^' , a low-pass filter and an in¬ 
verse Fourier transform"*, or some other local linear 
transformation^’’^*^^®’^'* ), the state estimator simply be¬ 
comes Sj = Xj where Xj is the processed m-dimensional 
delay vector. As we demonstrate in Section IV B, one 
can even use SPI to optimize parameter choices for 
forecast methods that use reconstructions that are not 
embeddings—i.e., those whose dimensions do not meet 
the traditional requirements for preserving dynamical in¬ 
variants like the Lyapunov exponent. 


III. EFFICIENT ESTIMATION OF SPI 


To calculate SPI from a real-valued time series, one 
must first symbolize those data. Simple binning is not a 
good solution here, as it is known to cause severe bias if 
the bin boundaries do not create a generating partition"^. 
A useful alternative is kernel estimation*^’**, in which the 
relevant probability density functions are estimated via 
a function 0 with a resolution or bandwidth r that mea¬ 
sures the similarity between two points in A x V space. 
(For SPI, X would be Sj and Y would be Xj+p.) Given 


4 


points {x^,yi} and {a;-,?/-} in X x F, one can define: IV. APPLYING SPI TO SELECT RECONSTRUCTION 

PARAMETERS 


■Pr{Xi,yi) 




where Q{x > 0) = 0 and 0(a: < 0) = 1. That is, Pr{xi, yi) 
is the proportion of the N pairs of points in X x F space 
that fall within the kernel bandwidth r of {xi,yi}, i.e., 
the proportion of points similar to {xi,yi}. When | • | 
is the max norm, this is the so-called box kernel. This 
too, however, can introduce bias'^^ and is dependent on 
the choice of bandwidth r. After these estimates, and 
the analogous estimates for p{x), are produced, they are 
then used directly to compute local estimates of mutual 
information for each point in space, which are then aver¬ 
aged over all samples to produce the mutual information 
of the time series. For more details on this procedure, 
see . 

A better way to calculate /[A;F] and estimate SPI is 
the Kraskov-Stiigbauer-Grassberger (KSG) estimator^*^. 
This approach dynamically alters the kernel bandwidth 
to match the density of the data, thereby smoothing out 
errors in the probability density function estimation pro¬ 
cess. In this approach, one first finds the nearest 
neighbor for each sample {x, y} (using max norms to 
compute distances in x and y), then sets kernel widths 
Xj. and Ty accordingly and performs the pdf estimation. 
There are two algorithms for computing I\X\ F] with the 
KSG estimator*^. The first is more accurate for small 
sample sizes but more biased; the second is more accu¬ 
rate for larger sample sizes. We use the second of the 
two in this paper, as we have fairly long time series. Our 
algorithm sets and Xy to the x and y distances to the 
nearest neighbor. One then counts the number of 
neighbors within and on the boundaries of these kernels 
in each marginal space, calling these sums xi^ and riy, 
and finally calculates 


+ V'(«'y)) + i’in) , 


In this section, we demonstrate how to use SPI to 
choose parameter values for delay-reconstruction forecast 
models. We do this for several synthetic examples, as well 
as for sensor data from several laboratory experiments. 
For the discussion that follows, we use the term “SPI- 
optimal” to refer to the parameter values (m and t) that 
provided the best match between the forecast and the 
true continuation. 

To evaluate a forecast model, we divide the signal 
into two parts: the initial training signal {xj}^^i —the 
first n elements of the time series—and the test signal 
{cf}fcn+i ’ where k is the length of the prediction. We 
build a delay reconstruction from the xj (i.e., a sequence 
of points [xj,Xj-r, ■ ■ ■, use it to generate 

a prediction and then use the Mean Abso¬ 

lute Scaled Error'*"^ to compare the prediction to the test 
signal: 


Ai+n+l 

MASE^ E — 

£=n+l n—1 


\xe - C£l __ 

^j=2 \^3 ~ ^i-11 


MASE is a normalized measure: the scaling term in 
the denominator is the average in-sample forecast error 
for a random-walk prediction—which uses the previous 
value in the observed signal as the forecast—calculated 
over the training signal. That is, MASE < I means 
that the prediction error in question was, on the aver¬ 
age, smaller than the in-sample error of a random-walk 
forecast on the training portion of the same data. Anal¬ 
ogously, MASE > 1 means that the corresponding pre¬ 
diction method did woxse, on average, than the random- 
walk method. 

While its comparative nature may seem odd, this error 
metric allows for fair comparison across varying methods, 
prediction horizons, and signal scales, making it a stan¬ 
dard error measure in the forecasting literature—and a 
good choice for the study described in the following sec¬ 
tions, which involve a number of very different signals. 


where ip is the digamma function'^^. This estimator has 
been demonstrated to be robust to variations in k as long 
as fc > 4'*^. 

In this paper, we employ the Java Information Dy¬ 
namics Toolkit (JIDT) implementation of the KSG 
estimator'^^. The computational complexity of this im¬ 
plementation is 0{kN log N), where N is the length of 
the time series and k is the number of neighbors being 
used in the estimate. While this is more expensive than 
traditional binning {0{N)), it is bias corrected, allows 
for adaptive kernel bandwidth to adjust for under- and 
over-sampled regions of space, and is both model and 
parameter free (aside from k, to which it is very robust). 


A. Synthetic examples 

In this Section, we apply SPI to some standard syn¬ 
thetic examples, both maps (Henon, logistic) and flows: 
the classic Lorenz system'^® and the more-recent “Lorenz 
96” atmospheric model"*®. We construct the traces for 
the Lorenz experiments using a standard fourth-order 
Runge-Kutta solver on the associated differential equa¬ 
tions, with a timestep of ^, for 60,000 time steps. For the 
maps, we simply iterate the difference equations 60,000 
times. In all cases, we discard the first 10,000 points of 
each trajectory to remove transient behavior, then sam¬ 
ple individual state variables to produce different scalar 
time-series data sets. We reconstruct the dynamics from 
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those traces using different values of the dimension m and 
delay r and compute SPI for each of those reconstructed 
trajectories. We then use Lorenz’s classic method of ana¬ 
logues (LMA)^' to generate forecasts of each trace, com¬ 
pute their MASE scores as described above, and dis¬ 
cuss their relationships to the SPI values for the corre¬ 
sponding time series. For simplicity, in this initial discus¬ 
sion we perform a series of one-step-ahead predictions, 
rebuilding the model at each step. For the SPI calcu¬ 
lations, this means that we estimate I[Sj^ Xj^i], with 
Sj = [Xj,Xj_T,... . In Section VB we 

expand this discussion by increasing the prediction hori¬ 
zon; in Section V A, we consider the effects of the length 
of the traces. 


Flow examples 

The Lorenz 96 system^® is defined by a set of K dif¬ 
ferential equations in the state variables ... ^k- 



(a) SPI values for different delay reconstructions of a 
representative trace from the Lorenz 96 system with 
{K = 22,F = 5}. 


ik = (Cfc+1 - Cfc-2)(?/c-l) - ^k+ F 

for k = 1,..., AT, where A S K is a constant forcing 
term that is independent of k. In the following discus¬ 
sion we focus on two parameter sets, {K = 22, F = 5} 
and {K = 47, F = 5}, which produce low- and high¬ 
dimensional chaos, respectively. See’^' for an explanation 
of this model and the associated parameters. 

Figure 3a shows a heatmap of the SPI values for recon¬ 
structions of a representative trajectory from this system 
with {K = 22, F = 5}, for a range of m and r. Not 
surprisingly, this image reveals a strong dependency be¬ 
tween the values of the reconstruction parameters and 
the reduction in uncertainty about the near future that 
is provided by the reconstruction. Very low r values, for 
instance, produce delay vectors with highly redundant 
coordinates, but which provide substantial information 
about the future. As mentioned in the first section of 
this paper, standard heuristics only focus on minimiz¬ 
ing redundancy between coordinates and choose the t 
value that minimizes the mutual information between 
the first two coordinates in the delay vector. For this 
trajectory, the approach of Fraser & Swinney'’ yields 
T = 26, while standard dimension-estimation heuristics'^ 
suggest m = 8. The SPI value for a delay reconstruc¬ 
tion built with those parameter values is 3.463. This 
is not, however, the SPI-optimal reconstruction; choos¬ 
ing m = 2 and r = I, for instance, results in a higher 
value (SPI = 5.303)—i.e., signficantly more reduction 
in uncertainty about the future. This may be somewhat 
counter-intuitive, since each of the delay vectors in the 
SPI-optimal reconstruction spans far less of the data set 
and thus one would expect points in that space to contain 
less information about the future. Figure 3a suggests, 
however, that this in fact not the case; rather, that un¬ 
certainty increases with both dimension and time delay. 
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(b) MASE scores for LMA forecasts on different delay 
reconstructions of a representative trace of the Lorenz 96 
system with {K = 22, F = 5}. 

FIG. 3: The effects of reconstruction parameter values 
on SPI and forecast accuracy for the Lorenz 96 system 




The question at issue in this paper is whether that re¬ 
duction in uncertainty about the future correlates with 
improved accuracy of an LMA forecast built from that 
reconstruction. Since the SPI-optimal choices maximize 
the shared information between the state estimator and 
Xj^i, one would expect a delay reconstruction model 
built with those choices to afford LMA the best lever¬ 
age. To test that conjecture, we performed an exhaus¬ 
tive search with m = 2,..., 15 and r = 1,. •., 50. For 
each {m,r} pair, we used LMA to generate forecasts 
from the corresponding reconstruction, computed their 
MASE scores, and plotted the results in a heatmap sim¬ 
ilar to the one in Figure 3a. As one would expect, the 
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MASE and SPI heatmaps are generally antisymmetric. 
This antisymmetry breaks down somewhat for low m and 
high T, where the forecast accuracy is low even though 
the reconstruction contains a lot of information about the 
future. We suspect that this is due to a combination of 
overfolding (due to too-large values of r) and projection 
(low m). Even though each point in such a reconstuc- 
tion may contain a lot of information about the future, 
the false crossings created by this combination of effects 
pose problems for a near-neighbor forecast strategy like 
LMA. The improvement that occurs if one adds another 
dimension is consistent with this explanation. Notice, 
too, that this effect only occurs far from the maximum 
in the SPI surface—the area that is of interest if one is 
using SPI to choose parameter values for reconstruction 
models. 

In general, though, maximizing the redundancy be¬ 
tween the state estimator and the future does appear 
to minimize the resulting forecast error of LMA. Indeed, 
the maximum on the surface of Figure 3b (to = 2, t = 1) 
is exactly the minimum on the surface of Figure 3a. The 
accuracy of this forecast is more than five times higher 
{MASE = 0.0737) than that of a forecast constructed 
with the parameter values suggested by the standard 
heuristics (0.3787). Note that the optima of these sur¬ 
faces may be broad: i.e., there may be ranges of to and 
T for which SPI and MASE are optimal, and roughly 
constant. In these cases, it makes sense to choose the 
lowest TO on the plateau, since that minimizes computa¬ 
tional effort, data requirements, and noise effects; see'^® 
for a full discussion of this. 

While the results discussed in the previous paragraph 
do provide a preliminary validation of the claim that 
one can use SPI to select good parameter values for de¬ 
lay reconstruction-based forecast strategies, they only in¬ 
volve a single example system. Similar experiments on 
traces from the Lorenz 96 system with different param¬ 
eter values {K = 47, F = 5} show identical results— 
indeed, the heatmaps are visually indistinguishable from 
the ones in Figure 3. Figure 4 shows heatmaps of SPI and 
MASE for similar experiments on the classic “Lorenz 
63” system ’c 

X = a{y — x) 
y = x{p-z)-y 
z = xy — /3z 

with the typical chaotic parameter selections: p = 
28, a = 10, and (3 = 8/3. As in the Lorenz 96 case, 
the heatmaps are generally antisymmetric, confirming 
that maximizing SPI is roughly equivalent to minimizing 
MASE. Again, though, the antisymmetry is not perfect; 
for high T and low to, the effects of projecting an over¬ 
folded attractor cause false crossings that trip up LMA. 
As before, adding a dimension mitigates this effect by 
removing these false crossings. Both the Lorenz 63 and 
Lorenz 96 plots show a general decrease in predictability 
for large to and high r, with roughly hyperbolic equipo- 
tentials dividing the colored regions®'**. The locations and 



(a) SPI values for different delay reconstructions of a 
representative trace from the Lorenz 63 system. 
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T 

(b) MASE scores for LMA forecasts on different delay 
reconstructions of a representative trace of the Lorenz 63 
system. 

FIG. 4: The effects of reconstruction parameter values 
on SPI and forecast accuracy for the Lorenz 63 system 


heights of these equipotentials differs because the two 
signals are not equally easy to predict. This matter is 
discussed further at the end of this section. 

Numerical SPI and MASE values for LMA forecasts 
on different reconstructions of both Lorenz systems are 
tabulated in the top three rows of Table I, along with 
the reconstruction parameter values that produced those 
results. The data in this table bring out two impor¬ 
tant points. First, as suggested by the heatmaps, the 
TO and r values that maximize SPI (termed mspi and 
Tspi in the table legend) are close, or identical, to the 
values that minimize MASE {niE and Tp) for all three 
Lorenz systems. This is notable because—as discussed 
in Section V A— the former can be estimated quite reli¬ 
ably from a small sample of the trajectory in only a few 
seconds of compute time, whereas the exhaustive search 
that is involved in computing mp and Tp for Table I 
required close to 30 hours of CPU time per signal. A 
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TABLE I: MASS values for various delay reconstructions of the different examples studied here. MASEh is the 
representative accuracy of LMA forecasts that use delay reconstructions with parameter values {mspi and tspi) 
chosen via standard heuristics for the corresponding traces—the methods of false neighbors'^'^ and time-delayed 
mutual information'^, respectively. Similarly, MASEspi is the accuracy of LMA forecasts that use reconstructions 
built with the m and r values that maximize SPI, and MASEp is the error of the best forecasts for each case, found 
via exhaustive search over the m,r parameter space. on these signals the standard heuristics failed. 


Signal 


MASEh 

Th 

mu 

MASEspi 

XSPI 

mspi 

MASEe 

Te 

mE 

Lorenz-96 K = 

22 

0.3787 

26 

8 

0.0737 

1 

2 

0.0737 

1 

2 

Lorenz-96 K = 

47 

1.007 

31 

10 

0.1156 

1 

2 

0.1156 

1 

2 

Lorenz 63 


0.2215 

12 

5 

0.0509 

1 

3 

0.0506 

1 

2 

Henon Map 


** 

* * 

** 

3.814e-04 

1 

2 

3.814e-04 

1 

2 

Logistic Map 


** 

* * 

** 

1.680e-05 

1 

1 

1.680e-05 

1 

1 


second important point that is apparent from the Ta¬ 
ble is that delay reconstructions built using the tradi¬ 
tional heuristics—the values with the iL subscript—were 
comparatively ineffective for the purposes of LMA-based 
forecasting. This is notable because that is the default 
approach in the literature on state-space based forecast¬ 
ing methods for dynamical systems. 

A close comparison of Figures 3 and 4 brings up an¬ 
other important point: some time series are harder to 
forecast than others. Figure 5 breaks down the details 
of the two suites of Lorenz-96 experiments, showing the 
distribution of SPI and MASE values for all of the re¬ 
constructions. Although there is some overlap in the 
AT = 47 and K = 22 histograms—i.e., best-case forecasts 
of the former are better than most of the forecasts of the 
latter—the AT = 47 traces generally contain less infor¬ 
mation about the future and thus are harder to forecast 
accurately. 


Map examples 


Delay reconstruction of discrete-time dynamical sys¬ 
tems, while possible in theory, can be problematic in 
practice. Although the embedding theorems do apply 
in these cases, the heuristics for estimating m and r of¬ 
ten fail. The time-delayed mutual information of’, for 
example, may decay exponentially, without showing any 
clear minimum. And the lack of spatial continuity of 
the orbit of a map violates the underlying idea behind 
the method of''^. State space-based forecasting methods 
can, however, be very useful in generating predictions of 
trajectories from systems like this —if one has a recon¬ 
struction that is faithful to the true dynamics. 

In view of this, it would be particularly useful if one 
could use SPI to choose embedding parameter values for 
maps. This section explores that notion using two canon¬ 
ical examples, shown in the bottom two rows of Table 1. 


o 

s 

0) 

::3 

0) 


600 

500 

400 

300 

200 

100 

0 



5.5 


SPI 

(a) SPI 


I Lorenz 96 {K = 22, F = 5} 
|Lorenz 96 {K = 47, F = 5} 



0.2 0.4 0.6 0.8 1 

MASE 



(b) MASE 


FIG. 5: Histograms of SPI and MASE values for 
representative traces from the Lorenz 96 
{K = 22, A = 5} and {AT = 47, F = 5} systems for all 
{TO,r} values in Figures 3 and 4. 


For the Henon map, 

Xn-\-l — 1 Vn 

Vn+l — bXn 

with a = 1.4 and b = 0.3, the SPI-optimal parameter 
values were m = 2 and r = 1. As in the flow exam¬ 
ples, these were identical to the values that minimized 





















MASE. These parameter values make sense, of course; 
a first-return map of the x coordinate is effectively the 
Henon map, so [xj,Xj-i\ is a perfect state estimator (up 
to a scaling term). But in practice, of course, one rarely 
knows the underlying dynamics of the system that gen¬ 
erated a time series, so the fact that one can choose 
good reconstruction parameter values by maximizing SPI 
is notable—especially since standard heuristics for that 
purpose fail in this system. 

The same pattern holds for the logistic map, Xn+i = 
rxn{i — Xn), with r = 3.65: the SPI-optimal parameter 
values coincide with the minimum of the MASE surface. 
As in the Henon example, these values {m = 1 and r = 1) 
make complete sense, given the form of the map. But 
again, one does not always know the form of the system 
that generated a given time series. In the case of the 
logistic map, the standard heuristics fail, but SPI clearly 
indicates that one does not actually need to reconstruct 
these dynamics—rather, near-neighbor forecasting on the 
time series itself is the best approach. 


B. Selecting reconstruction parameters of experimental 
time series 

The results in the previous section provide a prelimi¬ 
nary verification of the conjecture that maximizing SPI 
minimizes forecast accuracy of LMA, for both maps and 
flows. While experiments with synthetic examples are 
useful, they do not call the really important aspect of 
that research question: whether SPI is a useful way to 
choose parameter values for delay reconstruction-based 
forecasting of real-world data, where the time series are 
noisy and perhaps short, and one does not know the di¬ 
mension of the underlying system—let alone its govern¬ 
ing equations. In this section, we turn our attention to 
that question using experimental data from two different 
dynamical systems: a far-infrared laser and a laboratory 
computer-performance experiment. 


A Far-Infrared Laser 

A canonical test case in the forecasting literature is 
the so-called “Dataset A” from the Santa Fe Institute 
prediction competition’®, which was gathered from a far- 
infrared laser. As in the synthetic examples in the pre¬ 
vious section, the SPI and AIASE heatmaps (Figure 6) 
are largely antisymmetric for this signal. Again, there is 
a band across the bottom of each image because of the 
combined effects of overfolding and projection. Note the 
resemblance between Figures 6 and 4: the latter resem¬ 
ble “smoothed” versions of the former. It is well known^® 
that the SFI A dataset is well described by the Lorenz 63 
system with some added noise, so this similarity is both 
unsurprising and reassuring. LMA forecasts using the 
SPI-optimal reconstruction of this trace were more ac¬ 
curate than similar forecasts using a reconstruction built 
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(a) SPI values for different delay reconstructions of SFI 
Dataset A. 



T 

(b) MASE scores for LMA forecasts on different delay 
reconstructions of SFI Dataset A. 

FIG. 6: The effects of reconstruction parameter values 
on SPI and forecast accuracy for “Dataset A” from the 
Santa Fe Institute time-series prediction competition. 


using traditional heuristics {MASEspi = 0.0592 versus 
MASEh = 0.0733) and only slightly worse than the op¬ 
timal value {MASEe = 0.0538). However, the values 
of {mspijTgpj} and are not identical for this 

signal. This is because the optima in the heatmaps in 
Figure 6 are bands, rather than unique points—as was 
the case in the synthetic examples in Section IV A. In 
a situation like this, a range of {to, t} values are sta¬ 
tistically indistinguishable, from the standpoint of the 
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forecast accurary afforded by the corresponding recon¬ 
struction. The values suggested by the SPI calculation 
{fnspi = 9 and rgpi = 1) and by the exhaustive search 
{niE = 7, te = 1) were all on this plateau^'*. Again, it 
appears that one can use SPI to choose good parame¬ 
ter values for delay reconstruction-based forecasting, but 
SFI A is only a single trace from a fairly simple system. 


Computer Performance Dynamics 

Laboratory experiments on computer performance dy¬ 
namics have shown that these high-dimensional nonlin¬ 
ear systems exhibit a range of interesting deterministic 
dynamical behaviors'* ^*^ 2 ^ Both hardware and software 
play roles in these dynamics; changing either one can 
cause bifurcations from periodic orbits to low- and high¬ 
dimensional chaos. This rich range of behavior makes 
computer performance dynamics an ideal final test case 
for this paper. 

Collecting observations of the performance of a run¬ 
ning computer requires some significant engineering. 
Basically, one programs the microprocessor’s onboard 
hardware performance monitor to observe the quanti¬ 
ties of interest, then stops the program execution at 
100,000-instruction intervals—the unit of time in these 
experiments—and reads off the contents of those regis¬ 
ters. Interested readers can find a detailed description of 
this custom measurement infrastructure in'*^d3^ gjg_ 
nals that are produced by this apparatus are scalar time- 
series measurements of system metrics like processor effi¬ 
ciency {e.g., IPC, which measures how many instructions 
are being executed, on the average, in each clock cycle) 
or memory usage {e.g., how often the processor had to 
access the main memory during the measurement inter¬ 
val). 

Here, for conciseness, we focus on processor perfor¬ 
mance traces from two different programs, one simple 
and one complex, running on the same Intel i7-based 
computer. The first is four lines of C (colunajor) that 
repeatedly initializes a 256 x 256 matrix in column-major 
order. The second is a much more complex program: the 
403.gcc compiler from the SPEC 2006CPU benchmark 
suite”*"*. The performance traces of these two programs 
contained 147,925 points and 45,545 points, respectively. 
Since computer performance dynamics result from a com¬ 
position of hardware and software, these two experiments 
involve two different dynamical systems, even though the 
programs are running on the same computer. But since 
other effects could be at work—housekeeping by the op¬ 
erating system, etc.—we repeated each experiment 15 
times for a total of 30 traces. We have performed similar 
forecast experiments using other processor and memory 
performance metrics gathered during the execution of a 
variety of programs on several different computers"*'^. Our 
preliminary analysis indicates that the results described 
in the rest of this section hold for those traces as well. 

As in the previous examples, heatmaps of MASE and 


SPI for the colunajor time series (Figure 7b) are largely 
antisymmetric. And again, reconstructions using the 



r 

(a) SPI values for different delay reconstructions of a 
coljnajor trace. 
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(b) MASE scores for LMA forecasts on different delay 
reconstructions of a coljnajor trace. 


FIG. 7: The effects of reconstruction parameter values 
on SPI and forecast accuracy for a representative trace 
from a computer-performance dynamics experiment 
tracing the processor load during the execution of a 
simple program that repeatedly initializes a matrix in 
column-major order. 


SPI-optimal parameter values allowed LMA to produce 
highly accurate forecasts of this signal: MASEspi = 
0.0509, compared to the optimal MASEp = 0.0496. 
There are several major differences between these plots 
and the previous ones in this paper, though, beginning 
with the vertical stripes. These are due to the dominant 
unstable periodic orbit of period 3 in the chaotic attractor 
in the coljnajor dynamics. When r is a multiple of this 
period (r = 3 k), the coordinates of the delay vector are 
not independent, which lowers SPI and makes forecast¬ 
ing more difficult. (There is a nice theoretical discussion 
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of this effect in'^.) Conversely, SPI spikes and MASE 
plummets when r = 3k — 1, since the coordinates in such 
a delay vector cannot share any prime factors with the 
period of the orbit. The band along the bottom of both 
images is, again, due to a combination of overfolding and 
projection. 

Another difference between the coljnajor heatmaps 
and the ones in Figures 3, 4, and 6 is the apparent over¬ 
all trend: the “good” regions (low MASE and high SPI) 
are in the lower-left quadrants of those heatmaps, but in 
the upper-right quadrant of Figure 7. This is partly an 
artifact of the difference in the color-map scale, which 
was chosen here to bring out some important details of 
the structure, and partly due to that structure itself. 
Specifically, the optima of the coljnajor heatmaps—the 
large dark red and blue regions in Figures 7a and 7b, 
respectively—are much broader than the ones in the ear¬ 
lier sections of this paper, perhaps because the signal is 
so close to periodic. (This was also the case to some ex¬ 
tent in the SFI A example, for the same reason.) This 
geometry makes precise comparisons of SPI-optimal and 
MAS'A-optimal parameter values somewhat problem¬ 
atic, as the exact optima on two almost-flat but slightly 
noisy landscapes may not be in the same place. In¬ 
deed, the SPI values at {mspi, tspi} and {to£:, te} were 
within a standard error across all 15 traces of coljnajor. 

And that brings up an interesting tradeoff. For prac¬ 
tical purposes, what one wants is {msppTspi} values 
that produce a MASE value that is close to the opti¬ 
mum MASEe- However, the algorithmic complexity of 
most nonlinear time-series analysis and prediction meth¬ 
ods scales badly with m. In cases where the SPI maxi¬ 
mum is broad, then, one might want to choose the lowest 
value of m on that plateau—or even a value that is on 
the shoulder of that plateau, if one needs to balance ef¬ 
ficiency over accuracy. Indeed, forecasts with m = 2 
appear to work surprisingly well for many nonlinear dy¬ 
namical systems, including the coljnajor data^®. Fixing 
m = 2 amounts to marginalizing the heatmaps in Fig¬ 
ure 7, which produces a cross section like the ones shown 
in Figure 8. The antisymmetry between SPI and MASE 
is quite apparent in these plots; the global maximum of 
the former coincides with the global minimum of the lat¬ 
ter, at T = 2. The average MASE score of coljnajor 
forecasts constructed with m = 2 and this r value is 
0.0649. This is not much lower than the overall optimum 
of 0.0496—a value from a forecast whose free parame¬ 
ters required almost six orders of magnitude more CPU 
time to compute. As an important aside: these results 
suggest that one could bypass even more of the computa¬ 
tional effort that is involved in delay reconstruction-based 
forecasting by simply working in two dimensions, i.e., by 
calculating SPI across a range of rs, rather than across 
a 2D {m,T} space. This approach is discussed further 
in'^8. 

The correspondence between MASE and SPI also 
holds true for other marginalizations: i.e., the minimum 
MASE and the maximum SPI occur at the same r value 



T 


(a) SPI values for delay reconstructions of the coljnajor 
traces with m = 2 and a range of values of r. 



(b) MASE scores for LMA forecasts of delay 
reconstructions of the coljnajor traces with m = 2 and a 
range of values of r. 


FIG. 8: MASE and SPI for LMA forecasts of m = 2 
delay reconstructions of all 15 coljnajor traces, plotted 
as a function of r. The blue dashed curves show the 
averages across all trials; the red dotted lines are that 
average ± the standard deviation. 


for all m-wise slices of the col jnaj or heatmaps, to within 
statistical fluctuations. The methods of' and ' ' , inciden¬ 
tally, suggest th = 2 and mn = 12 for these traces; the 
MASE of an LMA forecast on such a reconstruction is 
0.0530, which is somewhat better than the best result 
from the m = 2 marginalization, although still short of 
the overall optimum. The correspondence between th 
and TSPI is coincidence; for this particular signal, maxi¬ 
mizing the independence of the coordinates happened to 
maximize the information about the future contained in 
each delay vector. The m = 12 result is not coincidence— 
and quite interesting, in view of the fact that the m = 2 
forecast is so good. It is also surprising in view of the 
huge number of transistors—potential state variables— 
in a modern computer. As described in^^, however, the 
hardware and software constraints in these systems con¬ 
fine the dynamics to a much lower-dimensional manifold. 
All of these issues, and their relation to the task of pre¬ 
diction, are explored in more depth in'^*^. 

The coljnajor program is what is known in the 
computer-performance literature as a “micro-kernel” — 
a extremely simple example that is used in proof-of- 
concept testing. The fact that its dynamics are so 
rich speaks to the complexity of the hardware (and the 
hardware-software interactions) in modern computers; 
again, see^^’"^^ for a much deeper discussion of these is¬ 
sues. Modern computer programs are far more complex 
than this simple micro-kernel, of course, which begs the 
question: what does SPI tell us about the dynamics 
of truly complex systems like that—programs that the 
computer-performance community models as stochastic 
systems? 

For 403. gcc, the answer is, again, that SPI appears 
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to be an effective and efficient way to assess predictabil¬ 
ity. It has been shown that this time series shares little 
to no information with the future: i.e., that it cannot 
be predicted using delay reconstruction-based forecast¬ 
ing methods, regardless of r and in values. The experi¬ 
ments in^''’ required dozens of hours of CPU time to es¬ 
tablish that conclusion; SPI gives the same results in a 
few seconds, using much less data. The structure of the 
heatmaps for this experiment, which are shown in Fig¬ 
ure 9, is radically different. The patterns visible in the 
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(a) SPI values for different delay reconstructions of a 
403.gcc trace. 



(b) MASE scores for LMA forecasts on different delay 
reconstructions of a 403.gcc trace. 

FIG. 9: The effects of reconstruction parameter values 
on SPI and forecast accuracy for a representative trace 
from a computer-performance dynamics experiment 
using the 403. gcc benchmark 

previous MASE plots, and the antisymmetry between 
SPI and MASE plots, are absent from Figure 9, reflect¬ 
ing the lack of predictive content in this signal. Note, too, 
that the color maps are different in this Figure. This re¬ 
flects the much lower values of SPI for this signal: a max¬ 
imum SPI of 0.7722 for 403.gcc, compared to 5.3026 for 


Lorenz 96 with K = 22. Indeed, the MASE surface in 
Figure 9b never dips below 1.0^®. That is, regardless of 
parameter choice, LMA forecasts of 403. gcc are no bet¬ 
ter than simply using the prior value of this scalar time 
series as the prediction. The uniformly low SPI values in 
Figure 9a are an effective indicator of this—and, again, 
they can be calculated quickly, from a relatively small 
sample of the data. It is to that issue that we turn next. 


V. DATA REQUIREMENTS AND PREDICTION 
HORIZONS 

In some real-world situations, it may be impractical to 
rebuild forecast models at every step, as we have done in 
the previous sections of this paper—because of compu¬ 
tational expense, for instance, or because the data rate 
is very high. In these situations, one may wish to pre¬ 
dict p time steps into the future, then stop and rebuild 
the model to incorporate the p points that have arrived 
during that period, and repeat. In chaotic systems, of 
course, there are fundamental limits on prediction hori¬ 
zon even if one is working with infinitely long traces of 
all state variables. A key question at issue in this section 
is how that effect plays out in forecast models that use 
delay reconstructions from scalar time-series data. And 
since real-world data sets are not infinitely long, it is im¬ 
portant to understand the effects of data length on the 
estimation of SPI. 


A. Data Requirements for SPI Estimation 

The quantity of data used in a delay reconstruction 
directly impacts the usefulness of that reconstruction. 
If one is interested in approximating the correlation di¬ 
mension via the Grassberger-Procaccia algorithm, for 
instance, it has been shown that one needs iQC^+O-^m) 
data points^^’”*®. Those bounds are overly pessimistic for 
forecasting, however. For example, Sugihara & May^'’ 
used delay-coordinate reconstructions with m as large 
as seven to successfully forecast biological and epidemi¬ 
ological time-series data sets that contain as few as 266 
points. A key challenge, then, is to determine whether 
one’s time series really calls for as many dimensions and 
data points as the theoretical results require, or whether 
one can get away with fewer dimensions—and how much 
data one needs in order to figure all of that out. 

We claim that SPI is a useful solution to those chal¬ 
lenges. As established in the previous sections, calcu¬ 
lations of this quantity can reveal what dimension one 
needs to build a good delay reconstruction for the pur¬ 
poses of LMA forecasting of nonlinear and chaotic sys¬ 
tems. And, as alluded to in those sections, SPI can be 
estimated accurately from a surprisingly small number 
of points. The experiments in this section explore that 
intertwined pair of claims in more depth by increasing 
the length of the Lorenz 96 traces and testing whether 
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the information content of the state estimator derived 
from standard heuristics converges to the SPI-optimal 
estimator 

Figure 10 shows the results. When the data length 
is short, the m = 2 state estimator had the most infor¬ 
mation about the future. This makes perfect sense; a 
short time series cannot fully sample a complicated ob¬ 
ject, and when an ill-sampled high-dimensional manifold 
is projected into a low dimensional space, infrequently 
visited regions of that manifold can act effectively like 
noise. From an information-theoretic standpoint, this 
would increase the effective Shannon entropy rate of each 
of the variables in the delay vector. In the I-diagram in 
Figure 2, this would manifest as drifting apart of the two 
circles, decreasing the shaded region that one needs to 
maximize for effective forecasting. 

If that reasoning is correct, longer data lengths should 
fill out the attractor, thereby mitigating the spuri¬ 
ous increase in the Shannon entropy rate and allowing 
higher-dimensional reconstructions to outperform lower¬ 
dimensional ones. This is indeed what one sees in Fig¬ 
ure 10. For both the K = 22 and K = A7 traces, once 
the signal is 2 million points long, the four-dimensional 
estimator has caught up to and even exceeded the two- 
dimensional case. Note, though, that the optimal SPI 
of the m = 8 reconstruction model is still lower than in 
the m = 2 or 4 cases, even at the right-hand limit of 
the plots in Figure 10. That is, even with a time series 
that contains 4 x 10® points, it is more effective to use a 
lower dimensional reconstruction to make an LMA fore¬ 
cast. But the really important message here is that SPI 
allows one to determine the best reconstruction parame¬ 
ters for the available data, which is an important part of 
the answer to the challenges outlined at the beginning of 
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(a) Optimal SPI for traces from the {K = 22, F = 5} Lorenz 
96 system 
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(b) Optimal SPI for traces from the {K = 47, F = 5} Lorenz 
96 system 


FIG. 10: SPI versus data length for traces from the 
Lorenz-96 system using r = 1 in all cases. Blue circles 
corresponds to an embedding dimension m = 2, purple 
diamonds to m = 4, and red xs to m = 8. 


this section. 

Something very interesting happens in the m = 2 re¬ 
sults for Lorenz 96 model with K = A7: the SPI curve 
reaches a maximum value around 100,000 points and 
stops increasing, regardless of data length. What this 
means is that this two-dimensional reconstruction con¬ 
tains as much information about the future as can be 
ascertained from these data, suggesting that increasing 
the length of the training set would not improve forecast 
accuracy. To explore this, we constructed LMA forecasts 
of different-length traces (100,000-2.2 million points) 
from this system, then reconstructed their dynamics with 
different m values and the appropriate tspi for each 
case. For m = 2, both SPI and MASE results did 
indeed plateau at 100,000 points—at 5.736 and 0.0809, 
respectively. As before, more data does afford higher¬ 
dimensional reconstructions more traction on the predic¬ 
tion problem: the m = 4 forecast accuracy surpassed 
m = 2 at around 2 million points [MASE = 0.0521). In 
neither case, by the way, did m = 8 catch up to either 
m = 2 or TO = 4, even at 4 million data points. Of course, 
one must consider the cost of storing the additional vari¬ 
ables in a higher-dimensional reconstruction model, par¬ 
ticularly in data sets this long, so it may be worthwhile in 
practice to settle for the to = 2 forecast—which is only 
slightly less accurate and requires only 100,000 points. 
This has another major advantage as well. If the time 
series is non-stationary, a forecast strategy that requires 
fewer points can adapt more quickly. 


B. Choosing reconstruction parameters for increased 
prediction horizons. 

So far in this paper, we have considered forecasts that 
were constructed one step at a time and studied the corre¬ 
spondence of their accuracy with one-step-ahead calcula¬ 
tions of SPI. In this section, we consider longer prediction 
horizons [p] and explore whether one can use a p-step- 
ahead version of SPI—i.e., with p > 1—to 

choose parameter values that maximize the information 
contained in each delay vector about the value of the time 
series p steps in the future. 

One would expect the SPI-optimal {to, r} values for a 
given time series to depend on the prediction horizon. It 
has been shown, for instance, that longer-term forecasts 
generally do better with larger r®, and conversely®*. It 
makes sense that one might need to reach different dis¬ 
tances into the past (via the span of the delay vector) 
in order to reduce the uncertainty about events that are 
further into the future®®. These effects are corroborated 
by SPI. Figure 11 demonstrates this in the context of 
the Lorenz 96 system with K = 22, focusing on to = 2 
for simplicity. The topmost trace in this figure is for the 
p = 1 case—i.e., a horizontal slice of Figure 3a made at 
TO = 2. The maximum of this curve is the optimal r 
value [tspi) for this reconstruction. The overall shape 
of this trace reflects the monotonic increase in the uncer- 
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FIG. 11: The effect of prediction horizon (p) on SPI of 
the K = 22 Lorenz 96 system for a fixed reconstruction 
dimension {m = 2). The traces in the image, starting 
from the top, correspond to prediction horizons of p = 1 
to p = 100. 


tainty about the future with t that is noted on page 5. 
The other traces in Figure 11 show SPI as a function of 
T for p = 2,3,..., down to p = 100 at the bottom of the 
figure. The lower traces do not decrease monotonically; 
rather, there is a slight initial rise. This is due to the 
point made above about the span of the delay vector: if 
one is predicting further into the future, it may be useful 
to reach further into the past. In general, this causes the 
optimal r to shift to the right as prediction horizon in¬ 
creases, going down the plot—i.e., longer prediction hori¬ 
zons require a greater r (cf.*’). For very long horizons, 
the choice of r appears to matter very little. In particu¬ 
lar, SPI is fairly constant and quite low for 5 < r < 50 
when p > 30—i.e., regardless of the choice of r, there is 
very little information about the p-distant future in any 
delay reconstruction of this signal for p > 30. This effect 
should not be surprising, and it is well corroborated in 
the literature. However, it can be hard to know a priori, 
when one is confronted with a data set from an unknown 
system, to know what prediction horizon makes sense. 
SPI offers a computationally efficient way to answer that 
question. 

Figure 12 shows a similar exploration of the other side 
of that question: the effects of the reconstruction dimen¬ 
sion on SPI, with r fixed at 1. The m = 2 state estimator 
contains more information about the future for short pre¬ 



FIG. 12: The effect of prediction horizon (p) on SPI of 
the K = 22 Lorenz 96 system for a fixed time delay 
(r = 1) and two different reconstruction dimensions. 
The red line is m = 2 and the blue is rriH = 8, the value 
suggested for this signal by the technique of false 
neighbors. 


diction horizons. This ties back to the discussion at the 
end of Section IV B: low-dimensional reconstructions can 
work quite well for short prediction horizons. However, 
Figure 12 shows that the full reconstruction is better for 
longer horizons. This is not terribly surprising, since a 
higher reconstruction dimension allows the state estima¬ 
tor to capture more information about the past. Finally, 
note that SPI decreases monotonically with prediction 
horizon for both m = 2 and mn- This, too, is unsurpris¬ 
ing. Pesin’s relation^'* says that the sum of the positive 
Lyapunov exponents is equal to the entropy rate, and if 
there is a non-zero entropy rate, then generically obser¬ 
vations will become increasingly independent the further 
apart they are. This explanation also applies to Fig¬ 
ure 11, of course, but it does not hold for signals that are 
wholly (or nearly) periodic. 

Recall that the coljnajor dynamics in Section IV B 
were chaotic, but with a dominant unstable periodic 
orbit—which had a variety of interesting effects in the 
results. Figure 13 explores the effects of prediction hori¬ 
zon on those results. Not surprisingly, there is some peri¬ 
odicity in the SPI versus p relationships, but not for the 
same reasons that caused the stripes in Figure 7b. Here, 
the peaks in SPI occur at multiples of the period. That 
is, the m = 2 state estimator can forecast with the most 
success when the value being predicted is in phase with 
the delay vector. Note that this effect is far stronger for 
m = 2 than mn, simply because of the instability of that 
periodic orbit; the visits made by the chaotic trajectory 
to that orbit are more likely to be short than long. As ex¬ 
pected, SPI decays with prediction horizon—but only at 
first, after which it begins to rise again, peaking at p = 69 
and p = 71. This may be due to a second higher-order 
unstable periodic orbit in the col_major dynamics. 

In theory, one can derive rigorous bounds on predic¬ 
tion horizon. The time at which Sj will no longer have 
any information about the future can be determined by 
considering: 


R{p) 


I[Sj; Vj+p] 
H[Xj+p] 


i.e., the percentage of the uncertainty in Xj^p that can be 
reduced by the delay vector. Generically, this will limit to 
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FIG. 13: The effect of prediction horizon (p) on SPI of 
the coljnajor for a fixed time delay (r = 1) and two 
different reconstruction dimensions. The red line is 
m = 2 and the blue is mn = 12, the value suggested for 
this signal by the technique of false neighbors. 
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some small value equal to the amount of information that 
the delay vector contains about any arbitrary point on 
the attractor. Given some criteria regarding how much 
information above the “background” is required of the 
state estimator, one could use the R{p) versus p to de¬ 
termine the maximum practical horizon. 

In practice, one can select parameters for delay 
reconstruction-based forecasting by explicitly including 
the prediction horizon in the SPI function, fixing its value 
at the required value, performing the same search as we 
did in earlier sections over a range of m and r, and then 
choosing a point on (or near) the optimum of that SPI 
surface. The computational and data requirements of 
this calculation, as shown in Section V A, are far superior 
to those of the standard heuristics used in delay recon¬ 
structions. 


VI. CONCLUSION 

In this paper, we have described a new metric for 
quantifying how much information about the future is 
contained in a delay reconstruction. Using a number of 
different dynamical systems, we demonstrated a direct 
correspondence between the SPI value for different de¬ 
lay reconstructions and the accuracy of forecasts made 
with Lorenz’s method of analogues on those reconstruc¬ 
tions. Since SPI can be calculated quickly and reliably 
from a relatively small amount of data, without needing 
to know anything about the governing equations or the 
state space dynamics of the system, that correspondence 
is a major advantage, in that it allows one to choose 
parameter values for delay reconstruction-based forecast 
models without doing an exhaustive search on the param¬ 
eter space. Significantly, SPI-optimal reconstructions are 
better, for the purposes of forecasting, than reconstruc¬ 
tions constructed using standard heuristics like mutual 
information and the method of false neighbors, which can 
require large amounts of data, significant computational 
effort, and expert human interpretation. SPI allows us 
to answer other questions regarding forecasting with the- 
oreticaly unsound models'^®—e.g., why one can obtain 
a better forecast using a low-dimensional reconstruction 
than with a full embedding. It also allows one to un¬ 
derstand bounds on prediction horizon without having 
to estimate Lyapunov spectra or Shannon entropy rates, 
which are difficult to obtain for arbitrary real-valued time 
series. That, in turn, allows one to tailor one’s recon¬ 
struction parameters to the amount of available data and 
the desired prediction horizon—and to know if a given 
prediction task is just not possible. 

The explorations in this paper involve a simple near¬ 
neighbor forecast strategy and state estimators that are 
basic delay reconstructions of raw time-series data. The 
definition and calculation of SPI do not involve any as¬ 
sumptions about the state estimator, though, so the re¬ 
sults presented here should also hold for other state es¬ 
timators. For example, it is common in time-series pre¬ 


diction to pre-process one’s data: for example, low-pass 
filtering or interpolating to produce additional points. 
Calculating SPI after performing such an operation will 
accurately reflect the amount of information in that new 
time series—indeed, it would reveal if that pre-processing 
step destroyed information. And we believe that the 
basic conclusions in this paper extend to other state- 
space based forecast schemas besides Lorenz’s method 
of analogues, such as those used —although 

SPI may not accurately select optimal parameter values 
for strategies that involve post-processing the data (e.g., 
GHKSS ’'^). We are in the process of exploring this. 

There are many other interesting potential ways to 
leverage SPI in the practice of forecasting. If the SPI- 
optimal T = 1, that may be a signal that the time series 
is undersampling the dynamics and that one should in¬ 
crease the sample rate. One could use SPI at a finer 
grain to optimizing r individually for each dimension, 
as suggested in®^. To do this, one could define Sj = 
[Xj,Xj-TiiXj-r 2 .^ ■ ■ ■ ^^ 3 -T^-i] and then simply max¬ 
imize SPI using that state estimator constrained over 
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